Delhi Metro Network Analysis

Metro Network Analysis involves exploring the network of a metro system to understand its structure, efficiency, and effectiveness. Such an analysis typically covers routes, stations, traffic, connectivity, and other operational aspects. In this notebook we will go through the task of Delhi Metro Network Analysis using Python.

Analyzing the metro network in a city helps improve urban transportation infrastructure, supports better city planning, and enhances the commuter experience. Following is the process we can use for the task of Metro Network Analysis of Delhi:

  1. Identify your objectives. These could include optimizing routes, reducing congestion, improving passenger flow, or understanding travel patterns.
  2. Collect data on metro lines, stations, connections, and transit schedules.
  3. Clean the data for inconsistencies, missing values, or errors.
  4. Create visual representations of the network, such as route maps, passenger flow charts, or heat maps of station congestion.
  5. Analyze how effectively the network handles passenger traffic and meets operational targets.

For the analysis of the Delhi Metro network, we need a dataset covering all metro lines in Delhi and how they connect with each other. I found a suitable dataset for this task. You can download the dataset from here.

Let’s get started with the task of Delhi Metro Network Analysis by importing the necessary Python libraries and the dataset:
In [1]:
import pandas as pd
import folium
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = "plotly_white"

metro_data = pd.read_csv("Delhi-Metro-Network.csv")

Now let's examine the dataset for any missing or null values, and then take a look at the data types.

In [2]:
# checking for missing values
missing_values = metro_data.isnull().sum()
print(missing_values)
Station ID                  0
Station Name                0
Distance from Start (km)    0
Line                        0
Opening Date                0
Station Layout              0
Latitude                    0
Longitude                   0
dtype: int64
In [3]:
# checking data types
data_types = metro_data.dtypes
print(data_types)
Station ID                    int64
Station Name                 object
Distance from Start (km)    float64
Line                         object
Opening Date                 object
Station Layout               object
Latitude                    float64
Longitude                   float64
dtype: object
In [4]:
# we can also use the info() method to get the same details in one call
metro_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 285 entries, 0 to 284
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Station ID                285 non-null    int64  
 1   Station Name              285 non-null    object 
 2   Distance from Start (km)  285 non-null    float64
 3   Line                      285 non-null    object 
 4   Opening Date              285 non-null    object 
 5   Station Layout            285 non-null    object 
 6   Latitude                  285 non-null    float64
 7   Longitude                 285 non-null    float64
dtypes: float64(3), int64(1), object(4)
memory usage: 17.9+ KB
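
Missing values are only one kind of inconsistency. As a minimal sketch of a further cleaning check, the snippet below strips stray whitespace from station names and looks for repeated (station, line) pairs. The toy rows are made up for illustration and stand in for `metro_data`; checking on the pair rather than the name alone is an assumption, made because a station may appear once per line it serves.

```python
import pandas as pd

# Toy rows standing in for metro_data; the values are made up for illustration.
sample = pd.DataFrame({
    "Station Name": ["Alpha", "Alpha ", "Beta", "Beta"],
    "Line": ["Red line", "Red line", "Red line", "Blue line"],
})

# Strip stray whitespace so "Alpha " and "Alpha" compare equal.
sample["Station Name"] = sample["Station Name"].str.strip()

# A station may appear on more than one line, so duplicates are
# checked on the (name, line) pair rather than on the name alone.
dupes = sample[sample.duplicated(subset=["Station Name", "Line"], keep=False)]
print(dupes)
```

Here only the two "Alpha" rows on the same line are flagged; "Beta" appears on two different lines and is left alone.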

The Opening Date column is stored as object (string), so we will convert it to datetime format:

In [5]:
# converting 'Opening Date' to datetime format
metro_data['Opening Date'] = pd.to_datetime(metro_data['Opening Date'])
metro_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 285 entries, 0 to 284
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Station ID                285 non-null    int64         
 1   Station Name              285 non-null    object        
 2   Distance from Start (km)  285 non-null    float64       
 3   Line                      285 non-null    object        
 4   Opening Date              285 non-null    datetime64[ns]
 5   Station Layout            285 non-null    object        
 6   Latitude                  285 non-null    float64       
 7   Longitude                 285 non-null    float64       
dtypes: datetime64[ns](1), float64(3), int64(1), object(3)
memory usage: 17.9+ KB
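
The conversion above succeeded because every value could be parsed. If a date column might contain malformed entries, one option is to pass `errors="coerce"` so that unparseable values become `NaT` and can be inspected. The sketch below uses a small made-up Series, not the actual dataset.

```python
import pandas as pd

# Made-up date strings; the last one is deliberately invalid.
raw_dates = pd.Series(["2008-04-06", "2018-10-31", "not a date"])

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, errors="coerce")

# Show the original strings that failed to parse.
bad_rows = raw_dates[parsed.isna()]
print(bad_rows)
```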

Geospatial Analysis

Visualizing the locations of the metro stations on a map is a good first step. It gives us insight into the geographical distribution of the stations across Delhi. We will use the latitude and longitude data to plot each station.

For this, we’ll create a map with a marker for each metro station, colored by line, which lets us look at aspects like station density and geographic spread. Let’s proceed with this visualization:

In [6]:
# defining a color scheme for the metro lines
line_colors = {
    'Red line': 'red',
    'Blue line': 'blue',
    'Yellow line': 'beige',
    'Green line': 'green',
    'Voilet line': 'purple',
    'Pink line': 'pink',
    'Magenta line': 'darkred',
    'Orange line': 'orange',
    'Rapid Metro': 'cadetblue',
    'Aqua line': 'black',
    'Green line branch': 'lightgreen',
    'Blue line branch': 'lightblue',
    'Gray line': 'lightgray'
}

delhi_map_with_line_tooltip = folium.Map(location=[28.7041, 77.1025], zoom_start=10)

# adding colored markers for each metro station with line name in tooltip
for index, row in metro_data.iterrows():
    line = row['Line']
    color = line_colors.get(line, 'black')  # Default color is black if line not found in the dictionary
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"{row['Station Name']}",
        tooltip=f"{row['Station Name']}, {line}",
        icon=folium.Icon(color=color)
    ).add_to(delhi_map_with_line_tooltip)

# Displaying the updated map
delhi_map_with_line_tooltip

The above map shows the geographical distribution of Delhi Metro stations. Each marker represents a metro station, colored by the line it belongs to. This map provides a visual explanation of how the metro stations are spread across Delhi.
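
Geographic spread can also be summarized numerically. As a rough sketch, the mean coordinates give a center point and the min/max coordinates give a bounding box. The coordinates below are hypothetical values chosen for illustration, not rows from the dataset.

```python
import pandas as pd

# Hypothetical coordinates standing in for the Latitude/Longitude columns.
stations = pd.DataFrame({
    "Latitude": [28.60, 28.70, 28.65],
    "Longitude": [77.10, 77.30, 77.20],
})

# Center of the stations: mean latitude and longitude.
center = [stations["Latitude"].mean(), stations["Longitude"].mean()]

# Bounding box as [[south, west], [north, east]].
bounds = [
    [stations["Latitude"].min(), stations["Longitude"].min()],
    [stations["Latitude"].max(), stations["Longitude"].max()],
]
print(center, bounds)
```

A center computed this way could be passed as `location` to `folium.Map`, and the bounding box to the map's `fit_bounds` method, instead of hard-coding the city coordinates.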

Temporal Analysis

Now, we will look at the growth of the Delhi Metro network over time by counting how many stations were opened each year and visualizing the result. This can provide insights into the pace of metro network expansion and its development phases.

We will do the following:

  1. Extract the year from the Opening Date.
  2. Count the number of stations opened each year.
  3. Visualize this information in a bar plot.

Let’s proceed with this analysis:

In [7]:
metro_data['Opening Year'] = metro_data['Opening Date'].dt.year

# counting the number of stations opened each year
stations_per_year = metro_data['Opening Year'].value_counts().sort_index()

stations_per_year_df = stations_per_year.reset_index()
stations_per_year_df.columns = ['Year', 'Number of Stations']

fig = px.bar(stations_per_year_df, x='Year', y='Number of Stations',
             title="Number of Metro Stations Opened Each Year in Delhi",
             labels={'Year': 'Year', 'Number of Stations': 'Number of Stations Opened'})

fig.update_layout(xaxis_tickangle=-45, xaxis=dict(tickmode='linear'),
                  yaxis=dict(title='Number of Stations Opened'),
                  xaxis_title="Year")

fig.show()

The chart displays the number of stations opened per year. We can clearly notice that:

  • A few years saw a large number of stations opened.
  • Other years saw few or no new stations.
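
Because `value_counts` only returns years that appear in the data, years with no openings are missing from the counts rather than shown as zero. The sketch below, using made-up opening years, fills those gaps and adds a cumulative total, which is one way to view overall network growth.

```python
import pandas as pd

# Made-up opening years standing in for metro_data['Opening Year'].
opening_years = pd.Series([2002, 2002, 2005, 2005, 2005, 2010])

per_year = opening_years.value_counts().sort_index()

# Insert the years with no openings as explicit zeros.
first, last = int(per_year.index.min()), int(per_year.index.max())
per_year = per_year.reindex(list(range(first, last + 1)), fill_value=0)

# Running total of stations in the network at the end of each year.
cumulative = per_year.cumsum()
print(cumulative)
```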

Line Analysis

Now, we will analyze the metro lines in terms of the number of stations and the average distance between stations. This will shed light on each line's characteristics, such as which line has the most stations.

To do this, we need to calculate the number of stations per line and the average distance between stations on each line. We will then visualize the results for easier comparison.

Let's begin:

In [8]:
stations_per_line = metro_data['Line'].value_counts()

# calculating the total distance of each metro line (max distance from start)
total_distance_per_line = metro_data.groupby('Line')['Distance from Start (km)'].max()

# both Series are indexed by line name, so the division aligns on the label
avg_distance_per_line = total_distance_per_line / (stations_per_line - 1)

# reorder the averages to match stations_per_line before taking raw values;
# mixing a labelled Series with positional arrays would pair each average
# with the wrong line
line_analysis = pd.DataFrame({
    'Line': stations_per_line.index,
    'Number of Stations': stations_per_line.values,
    'Average Distance Between Stations (km)': avg_distance_per_line.reindex(stations_per_line.index).values
})

# sorting the DataFrame by the number of stations
line_analysis = line_analysis.sort_values(by='Number of Stations', ascending=False)

line_analysis.reset_index(drop=True, inplace=True)
print(line_analysis)
                 Line  Number of Stations  \
0           Blue line                  49   
1           Pink line                  38   
2         Yellow line                  37   
3         Voilet line                  34   
4            Red line                  29   
5        Magenta line                  25   
6           Aqua line                  21   
7          Green line                  21   
8         Rapid Metro                  11   
9    Blue line branch                   8   
10        Orange line                   6   
11          Gray line                   3   
12  Green line branch                   3   

    Average Distance Between Stations (km)  
0                                 1.097917  
1                                 1.421622  
2                                 1.269444  
3                                 1.318182  
4                                 1.167857  
5                                 1.379167  
6                                 1.355000  
7                                 1.240000  
8                                 1.000000  
9                                 1.157143  
10                                4.160000  
11                                1.950000  
12                                1.050000  
In [9]:
# creating subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Number of Stations Per Metro Line',
                                                    'Average Distance Between Stations Per Metro Line'),
                    horizontal_spacing=0.2)

# plot for Number of Stations per Line
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Number of Stations'],
           orientation='h', name='Number of Stations', marker_color='crimson'),
    row=1, col=1
)

# plot for Average Distance Between Stations
fig.add_trace(
    go.Bar(y=line_analysis['Line'], x=line_analysis['Average Distance Between Stations (km)'],
           orientation='h', name='Average Distance (km)', marker_color='navy'),
    row=1, col=2
)

# update xaxis properties
fig.update_xaxes(title_text="Number of Stations", row=1, col=1)
fig.update_xaxes(title_text="Average Distance Between Stations (km)", row=1, col=2)

# update yaxis properties
fig.update_yaxes(title_text="Metro Line", row=1, col=1)
fig.update_yaxes(title_text="", row=1, col=2)

# update layout
fig.update_layout(height=600, width=1200, title_text="Metro Line Analysis", template="plotly_white")

fig.show()

The table presents a detailed analysis of the Delhi Metro lines, including the number of stations on each line and the average distance between stations.

The bar chart gives a clearer view of the table details. It has two panels: one for the number of stations per line and another for the average distance between stations.
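
The average spacing above is computed as the line's maximum distance from start divided by the number of gaps (stations minus one), which assumes the first station of each line sits at distance 0. As a cross-check under that assumption, the sketch below computes the mean of the consecutive gaps directly on a toy table. The two lines "A" and "B" and their distances are made up.

```python
import pandas as pd

col = "Distance from Start (km)"

# Made-up lines and cumulative distances for illustration.
toy = pd.DataFrame({
    "Line": ["A", "A", "A", "B", "B"],
    col: [0.0, 1.0, 3.0, 0.0, 2.0],
})

# Gap between each station and the previous one on the same line
# (NaN for the first station of each line, ignored by mean()).
ordered = toy.sort_values(["Line", col])
gaps = ordered.groupby("Line")[col].diff()
mean_gap = gaps.groupby(ordered["Line"]).mean()

# Shortcut used in the analysis: line length / (stations - 1).
shortcut = toy.groupby("Line")[col].max() / (toy["Line"].value_counts() - 1)

print(mean_gap)
print(shortcut)
```

The two results agree whenever each line starts at distance 0. If a line started at a non-zero distance, the gap-based mean would still be correct while the shortcut would overestimate the spacing.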

Station Layout Analysis

Now it is time to explore the layouts of the stations (Elevated, Ground Level, Underground) and see whether any patterns emerge. To do so, we will calculate the frequency of each layout type and visualize the counts to better understand their distribution.

Let's begin:

In [10]:
layout_counts = metro_data['Station Layout'].value_counts()

# creating the bar plot using Plotly
fig = px.bar(x=layout_counts.index, y=layout_counts.values,
             labels={'x': 'Station Layout', 'y': 'Number of Stations'},
             title='Distribution of Delhi Metro Station Layouts',
             color=layout_counts.index,
             # the layout is categorical, so a discrete palette is what applies here
             color_discrete_sequence=px.colors.qualitative.Pastel)

# updating layout for better presentation
fig.update_layout(xaxis_title="Station Layout",
                  yaxis_title="Number of Stations",
                  coloraxis_showscale=False,
                  template="plotly_white")

fig.show()

The bar chart and the counts show the distribution of different station layouts in the Delhi Metro network. We can conclude that:

  • Elevated Stations: The majority of the stations are Elevated. It is a common design choice in urban areas to save space and reduce land acquisition issues.
  • Underground Stations: The Underground stations are fewer compared to elevated ones. These are likely in densely populated or central areas where above-ground construction is less feasible.
  • At-Grade Stations: There are only a few At-Grade (ground level) stations, suggesting they are less common in the network, possibly due to land and traffic considerations.
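
To look for patterns beyond the overall counts, one option is to break the layouts down by line with `pd.crosstab`. This is a sketch on made-up rows, so the counts below say nothing about the real network.

```python
import pandas as pd

# Made-up rows standing in for the Line and Station Layout columns.
toy = pd.DataFrame({
    "Line": ["Red line", "Red line", "Yellow line", "Yellow line", "Yellow line"],
    "Station Layout": ["Elevated", "Elevated", "Underground", "Underground", "Elevated"],
})

# Number of stations of each layout on each line.
layout_by_line = pd.crosstab(toy["Line"], toy["Station Layout"])

# Same table as row-wise shares, so lines of different length are comparable.
layout_share = pd.crosstab(toy["Line"], toy["Station Layout"], normalize="index")

print(layout_by_line)
print(layout_share)
```

Applied to `metro_data`, the same call would show which lines are mostly elevated and which run mostly underground.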

Summary

We have gone through the process of analyzing the Delhi Metro network. We examined its structure through the geographic distribution of stations, the growth of the network over time, the number and spacing of stations on each line, and the station layouts.
